3D Computer Vision (2024/25)¶
Exercise¶
Upload: 30.10.2024 (11:30)
Deadline: 16.01.2025 (23:59)
Group 07¶
- Daiana Tei
- Ahsan Aftab
- Dhruv Kale
By submitting this exercise, we confirm the following:
- All people listed above contributed to this solution
- No other people were involved in this solution
- No contents of this solution were copied from others (this includes people, large language models, websites, etc.)
3D Scene Reconstruction¶
Task Overview¶
Your task in this exercise is to produce a dense reconstruction of a scene. This will involve multiple steps that you will encounter and learn about as the semester progresses. You can start implementing individual steps as soon as you learn about them, or wait until you have learned more and implement everything together. In the latter case, be mindful that this exercise is designed for an entire semester and the workload is accordingly large.
You will be given the following data:
- 9 color images of the scene.
  - 8-bit RGB per pixel.
  - Each image rendered from a different position.
  - The camera used had lens distortion.
- 9 depth images of the scene.
  - 8-bit grayscale per pixel: the result of dividing the Z-depth by each image's maximum and then multiplying by 255.
  - Each image has the same pose as the corresponding RGB image.
  - The camera used was free of any distortion.
- 1 dictionary containing camera calibration parameters.
  - They belong to the camera that was used to render the RGB images.
  - Distortion coefficients are given in the standard [k1, k2, p1, p2, k3] order.
- 1 NumPy array containing 8 camera transformations.
  - They specify the movements that the camera went through to render all images, i.e. idx 0 specifies the transformation from 00.png to 01.png, idx 1 specifies the transformation from 01.png to 02.png, and so on (see the sketch after this list).
  - This applies to both RGB and depth images, as they have the same poses.
- 1 NumPy array containing 7 features.
  - The features are specified for each of the 9 images.
  - Each feature is a 2D pixel location in "H, W" order, meaning the first value is the height/row in the image and the second the width/column.
  - If a feature was not visible, it was entered as [-1, -1].
  - The features are unsorted, meaning that feature idx 0 for 00.png could correspond to e.g. feature idx 4 for 01.png.
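As a quick orientation before the implementation, here is a minimal sketch of how the 8 relative transformations chain into transformations from the first view to every view (assuming, as the reprojection checks later confirm, that each matrix maps view i coordinates to view i+1 coordinates):
import numpy as np
camera_movement = np.load("data/camera_movement.npy")  # shape (8, 4, 4)
# transformation from view 0 into view i, for all 9 views (identity for view 0)
to_view = [np.eye(4)]
for T in camera_movement:
    to_view.append(T @ to_view[-1])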
Solution requirements¶
- Your code needs to compile, run, and produce an output.
- Your target output should be a dense point cloud reconstruction (without holes) of the scene.
  - The output should be in the .ply format. We provide a function that exports a .ply file.
  - You may inspect your .ply outputs in e.g. MeshLab.
  - See the 'Dense Point Cloud' sample image to get an idea of what is possible. (MeshLab screenshot with point shading set to None)
- Your code should be a general solution.
  - This means that it could run correctly for a different dataset (with the same input structure).
  - It should NOT include anything hardcoded specific to this dataset.
- Your code should not be unnecessarily inefficient.
  - Our sample solution runs in less than 2 minutes total (including point cloud export).
  - If your solution runs for more than 10 minutes, you are being wasteful in some part of your program.
Environment setup¶
!python --version
Python 3.11.11
!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install scikit-learn
!pip install opencv-python
!pip install scipy
!pip install torch
!pip install torchvision
# note: the maintained package is 'open3d'; 'open3d-python' is a long-deprecated
# 0.3.x release whose API does not match the open3d usage below
!pip install open3d
(pip install output truncated; all requirements already satisfied)
Imports¶
Please note the following:
- These are all imports necessary to achieve the sample results.
- You may remove and/or add other libraries at your own convenience.
- Using library functions (from the given or other libraries) that bypass necessary computer vision tasks will not be recognized as 'solved'.
  - E.g.: If you need to undistort an image to get to the next step of the solution and use the library function cv2.undistort(), then we will evaluate the undistortion step as 'failed'.
  - E.g.: If you want to draw points in an image (to check your method or visualize in-between steps) and use the library function cv2.circle(), then there is no problem.
  - E.g.: If you need to perform complex mathematical operations and use some numpy function, then there is no problem.
  - E.g.: If you do not like a provided utility function and find/know a library function that gives the same outputs from the same inputs, then there is no problem.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import cv2
import open3d as o3d
import scipy
import torch
import torchvision
Data¶
This should load all available data and also create some output directories. Feel free to rename variables or add additional directories as you see fit.
# Inputs
base_path = os.getcwd()
data_path = os.path.join(base_path, "data")
img_path = os.path.join(data_path, 'images')
depth_path = os.path.join(data_path, 'depths')
print(f"The project's root path is '{base_path}'.")
print(f"Reading data from '{data_path}'.")
print(f"Image folder: '{img_path}'.")
print(f"Depth folder: '{depth_path}'.")
# Outputs
out_path = os.path.join(base_path, 'output')
ply_path = os.path.join(out_path, 'point_cloud')
os.makedirs(out_path, exist_ok=True)
os.makedirs(ply_path, exist_ok=True)
print(f"\nCreating directory '{out_path}'.")
print(f"Creating directory '{ply_path}'.")
# Load Data
camera_calibration = np.load(os.path.join(data_path, 'camera_calibration.npy'), allow_pickle=True)
camera_calibration = camera_calibration.item()  # unwrap the dictionary from the 0-d numpy array
given_features = np.load(os.path.join(data_path, 'given_features.npy'), allow_pickle=True)
camera_movement = np.load(os.path.join(data_path, 'camera_movement.npy'), allow_pickle=True)
print(f"\nCamera Calibration:")
for key, value in camera_calibration.items():
    print(f" {key}: {value}")
print(f"Camera Movement: {camera_movement.shape}")#[Cameras-1, 4, 4]
print(f"2D Features (Unsorted): {given_features.shape}")#[Camera_idx, Feature_idx, 2]
The project's root path is '/Users/ahsanaftab/Semester-3/3D-Computer vision/project/3D-Scene-Reconstruction'.
Reading data from '/Users/ahsanaftab/Semester-3/3D-Computer vision/project/3D-Scene-Reconstruction/data'.
Image folder: '/Users/ahsanaftab/Semester-3/3D-Computer vision/project/3D-Scene-Reconstruction/data/images'.
Depth folder: '/Users/ahsanaftab/Semester-3/3D-Computer vision/project/3D-Scene-Reconstruction/data/depths'.

Creating directory '/Users/ahsanaftab/Semester-3/3D-Computer vision/project/3D-Scene-Reconstruction/output'.
Creating directory '/Users/ahsanaftab/Semester-3/3D-Computer vision/project/3D-Scene-Reconstruction/output/point_cloud'.

Camera Calibration:
 distortion_param: [-0.1, 0.02, 0.0, 0.0, -0.01]
 image_height: 551
 image_width: 881
 principal_point: [275.0, 440.0]
 focal_length_mm: 25
 sensor_width_mm: 35
 pixel_ratio: 1.0
 pixel_per_mm: 25.17142857142857
 focal_length_px: 629.2857142857142
Camera Movement: (8, 4, 4)
2D Features (Unsorted): (9, 7, 2)
import glob
rgb_paths = sorted(glob.glob("data/images/*.png"))
depth_paths = sorted(glob.glob("data/depths/*.png"))
feature_data = np.load("data/given_features.npy")
output_folder = "output"
for i in range(len(rgb_paths)):
rgb_image = cv2.imread(rgb_paths[i])
depth_map = cv2.imread(depth_paths[i], cv2.IMREAD_GRAYSCALE)
features = feature_data[i]
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.imshow(cv2.cvtColor(rgb_image, cv2.COLOR_BGR2RGB))
plt.title(f"RGB Image {i}")
plt.subplot(1, 3, 2)
plt.imshow(depth_map, cmap="jet_r")
plt.colorbar()
plt.title(f"Depth Map {i}")
plt.subplot(1, 3, 3)
plt.imshow(cv2.cvtColor(rgb_image, cv2.COLOR_BGR2RGB))
for feature in features:
if feature[0] != -1: # invalid features: if feature is not visible in image, coordinates are [-1, -1]
plt.plot(feature[1], feature[0], 'ro', markersize=2)
plt.title(f"Features {i}")
plt.tight_layout()
plt.show()
Undistortion¶
focal_length_px = camera_calibration['focal_length_px']
principal_point = camera_calibration['principal_point']
coeffs = camera_calibration['distortion_param']
# principal_point is stored (row, col), so cx = principal_point[1], cy = principal_point[0]
K = np.array([
    [focal_length_px, 0, principal_point[1]],
    [0, focal_length_px, principal_point[0]],
    [0, 0, 1]
], dtype=np.float64)
print("Camera matrix K:")
print(K)
print("\nDistortion coefficients:")
print(coeffs)
Camera matrix K:
[[629.28571429   0.         440.        ]
 [  0.         629.28571429 275.        ]
 [  0.           0.           1.        ]]

Distortion coefficients:
[-0.1, 0.02, 0.0, 0.0, -0.01]
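As a sanity check, the pixel focal length printed above follows directly from the metric calibration entries (a minimal sketch):
# focal length in pixels = focal length in mm * pixels per mm,
# where pixels per mm = image_width / sensor_width_mm
pixels_per_mm = camera_calibration['image_width'] / camera_calibration['sensor_width_mm']  # 881 / 35 ≈ 25.171
f_px = camera_calibration['focal_length_mm'] * pixels_per_mm  # 25 * 25.171... ≈ 629.286
assert np.isclose(f_px, camera_calibration['focal_length_px'])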
def undistort_image_inverse_sampling(image, file_name, K, distortion_params):
    h, w = image.shape[:2]
    f_x, f_y = K[0, 0], K[1, 1]
    u_0, v_0 = K[0, 2], K[1, 2]
    # grid of undistorted pixel coordinates
    u_grid, v_grid = np.meshgrid(np.arange(w), np.arange(h))
    # normalisation to camera coordinates
    x = (u_grid - u_0) / f_x
    y = (v_grid - v_0) / f_y
    # inverse sampling: apply the *forward* distortion model to every undistorted
    # coordinate to find where it must be read from in the distorted image
    k1, k2, p1, p2, k3 = distortion_params
    r2 = x**2 + y**2
    radial_distortion = 1 + k1*r2 + k2*r2**2 + k3*r2**3
    tangential_x = 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    tangential_y = p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    x_dist = x * radial_distortion + tangential_x
    y_dist = y * radial_distortion + tangential_y
    # map distorted normalised coordinates back to pixels
    map_x = (f_x * x_dist + u_0).astype(np.float32)
    map_y = (f_y * y_dist + v_0).astype(np.float32)
    # bilinear interpolation to sample the distorted image
    undistorted = cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
    os.makedirs(output_folder, exist_ok=True)
    output_path = os.path.join(output_folder, file_name)
    cv2.imwrite(output_path, undistorted)
    print(f"Saved undistorted image to {output_path}")
    return undistorted
for i in range(len(rgb_paths)):
distorted_image = cv2.imread(rgb_paths[i])
undistorted_image = undistort_image_inverse_sampling(distorted_image, f"{i}.png", K, coeffs)
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(distorted_image, cv2.COLOR_BGR2RGB))
plt.title("Distorted Image")
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(undistorted_image, cv2.COLOR_BGR2RGB))
plt.title("Undistorted Image")
plt.show()
undistorted_images = sorted(glob.glob(output_folder+"/*.png"))
Saved undistorted image to output/0.png
Saved undistorted image to output/1.png
Saved undistorted image to output/2.png
Saved undistorted image to output/3.png
Saved undistorted image to output/4.png
Saved undistorted image to output/5.png
Saved undistorted image to output/6.png
Saved undistorted image to output/7.png
Saved undistorted image to output/8.png
Matching features¶
Data observation¶
print(feature_data)
[[[163 616] [431 593] [380 672] [164 660] [378 462] [280 422] [274 650]]
 [[192 722] [183 694] [398 444] [495 651] [495 446] [346 704] [282 376]]
 [[252 536] [384 621] [291 480] [157 501] [ -1 -1] [ -1 -1] [157 540]]
 [[ -1 -1] [199 614] [ -1 -1] [307 657] [195 658] [409 690] [285 422]]
 [[168 693] [474 681] [ -1 -1] [285 374] [157 722] [322 722] [473 446]]
 [[285 481] [207 500] [ -1 -1] [ -1 -1] [207 539] [ -1 -1] [303 539]]
 [[187 650] [448 574] [284 636] [386 649] [313 424] [177 610] [383 460]]
 [[ -1 -1] [ -1 -1] [ -1 -1] [ -1 -1] [444 367] [ -1 -1] [505 444]]
 [[472 485] [350 602] [473 631] [259 554] [ -1 -1] [ -1 -1] [ -1 -1]]]
for i in range(len(undistorted_images)):
rgb_image = cv2.imread(undistorted_images[i])
features = feature_data[i]
plt.imshow(cv2.cvtColor(rgb_image, cv2.COLOR_BGR2RGB))
for feature in features:
if feature[0] != -1: # invalid features: if feature is not visible in image, coordinates are [-1, -1]
plt.plot(feature[1], feature[0], 'ro', markersize=2)
plt.title(f"Features {i}")
plt.tight_layout()
plt.show()
print(camera_movement)
[[[ 7.07106781e-01 -1.83012702e-01  6.83012702e-01 -3.00000000e+00]
  [ 1.83012702e-01  9.80379875e-01  7.32233047e-02 -2.58819045e-01]
  [-6.83012702e-01  7.32233047e-02  7.26726907e-01  9.65925826e-01]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

 [[ 6.12323400e-17  2.58819045e-01 -9.65925826e-01  4.00000000e+00]
  [-2.58819045e-01  9.33012702e-01  2.50000000e-01 -1.03527618e+00]
  [ 9.65925826e-01  2.50000000e-01  6.69872981e-02  3.86370331e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

 [[ 7.07106781e-01 -1.83012702e-01  6.83012702e-01 -2.82842712e+00]
  [ 1.29893408e-16  9.65925826e-01  2.58819045e-01 -1.00000000e+00]
  [-7.07106781e-01 -1.83012702e-01  6.83012702e-01  1.41421356e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

 [[ 7.07106781e-01 -1.29893408e-16  7.07106781e-01 -3.00000000e+00]
  [ 1.29893408e-16  1.00000000e+00  5.38036114e-17 -2.22044605e-16]
  [-7.07106781e-01  5.38036114e-17  7.07106781e-01  1.00000000e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

 [[ 6.12323400e-17  1.83697020e-16 -1.00000000e+00  4.00000000e+00]
  [-1.83697020e-16  1.00000000e+00  1.83697020e-16 -7.77156117e-16]
  [ 1.00000000e+00  1.83697020e-16  6.12323400e-17  4.00000000e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

 [[ 7.07106781e-01 -2.08398031e-16  7.07106781e-01 -2.82842712e+00]
  [ 2.98836239e-01  9.06307787e-01 -2.98836239e-01  1.21494310e+00]
  [-6.40856382e-01  4.22618262e-01  6.40856382e-01  2.12694929e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

 [[ 7.07106781e-01 -2.98836239e-01  6.40856382e-01 -3.00000000e+00]
  [ 5.00000000e-01  8.52165513e-01 -1.54317655e-01  1.41421356e+00]
  [-5.00000000e-01  4.29547251e-01  7.51990132e-01  0.00000000e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

 [[ 6.12323400e-17  7.07106781e-01 -7.07106781e-01  2.00000000e+00]
  [-7.07106781e-01  5.00000000e-01  5.00000000e-01 -1.41421356e+00]
  [ 7.07106781e-01  5.00000000e-01  5.00000000e-01  1.41421356e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]]
Camera transformations visualisation¶
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def look_at(eye, target, up):
eye = np.array(eye, dtype=np.float64)
target = np.array(target, dtype=np.float64)
up = np.array(up, dtype=np.float64)
forward = target - eye
forward /= np.linalg.norm(forward)
right = np.cross(up, forward)
right /= np.linalg.norm(right)
up = np.cross(forward, right)
rotation_matrix = np.eye(4)
rotation_matrix[:3, 0] = right
rotation_matrix[:3, 1] = up
rotation_matrix[:3, 2] = -forward
translation_matrix = np.eye(4)
translation_matrix[:3, 3] = -eye
return rotation_matrix @ translation_matrix
# Initial position
eye = [0, 0, 0]
target = [1, 1, 0]
up = [0, 0, 1]
initial_pose = look_at(eye, target, up)
camera_poses = [initial_pose]
current_pose = initial_pose.copy()
for transform in camera_movement:
current_pose = current_pose @ transform
camera_poses.append(current_pose)
camera_positions = np.array([pose[:3, 3] for pose in camera_poses])
camera_rotations = np.array([pose[:3, :3] for pose in camera_poses])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(camera_positions[:, 0], camera_positions[:, 1], camera_positions[:, 2], c='r')
for i, (pose, R) in enumerate(zip(camera_positions, camera_rotations)):
x_axis = R[:, 0]
y_axis = R[:, 1]
z_axis = R[:, 2]
start = pose
length = 0.5
ax.quiver(start[0], start[1], start[2], x_axis[0], x_axis[1], x_axis[2], color='r', length=length)
ax.quiver(start[0], start[1], start[2], y_axis[0], y_axis[1], y_axis[2], color='g', length=length)
ax.quiver(start[0], start[1], start[2], z_axis[0], z_axis[1], z_axis[2], color='b', length=length)
ax.text(start[0], start[1], start[2], f' {i}', None)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
Feature matching¶
Computing fundamental matrices¶
def get_essential_matrix(R, t):
"""
Computes essential matrix E (3x3) from rotation matrix (3x3) and translation vector (3,).
"""
# Skew-symmetric matrix for t
t_skew = np.array([
[0, -t[2], t[1]],
[t[2], 0, -t[0]],
[-t[1], t[0], 0]
])
# E = [t]_x R
E = t_skew @ R
return E
def get_fundamental_matrix(E, K):
"""
Computes fundamental matrix F (3x3) from essential matrix E (3x3) and camera matrix K (3x3).
"""
# F = K^-T E K^-1
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)
# Normalisation
F = F / F[2, 2]
return F
num_images = len(undistorted_images)
F = []
for i in range(num_images-1):
R = camera_movement[i][:3, :3]
t = camera_movement[i][:3, 3]
E = get_essential_matrix(R, t)
F.append(get_fundamental_matrix(E, K))
print(F)
[array([[-4.28363670e-22, 9.93913280e-06, -1.05735875e-03],
[ 1.40560564e-05, -4.41360389e-07, -2.44523175e-02],
[-1.49533109e-03, 1.50993082e-02, 1.00000000e+00]]), array([[ 6.96713717e-22, 1.06817758e-05, -1.13636364e-03],
[ 1.06817758e-05, -6.13874952e-22, 2.25903012e-03],
[-1.13636364e-03, -1.16589928e-02, 1.00000000e+00]]), array([[-6.56430105e-06, 1.09822897e-05, 5.99649959e-03],
[ 9.28332357e-06, 7.20810283e-06, -2.29953094e-02],
[-3.79544239e-03, 1.02149656e-02, 1.00000000e+00]]), array([[-1.03326556e-22, 3.81056112e-06, -1.04790431e-03],
[ 5.38894722e-06, -1.20099078e-22, -9.15351177e-03],
[-1.48196048e-03, 5.51714814e-03, 1.00000000e+00]]), array([[ 4.37686340e-23, 4.13223140e-06, -1.13636364e-03],
[ 4.13223140e-06, -0.00000000e+00, 7.82172373e-04],
[-1.13636364e-03, -4.41853601e-03, 1.00000000e+00]]), array([[ 3.16127319e-06, 3.16127319e-06, -4.24965438e-03],
[ 6.89910318e-07, -2.67202356e-06, -4.23413869e-03],
[ 8.16760564e-04, 2.94976229e-03, 1.00000000e+00]]), array([[-1.31878981e-05, 1.13296507e-05, 1.51684870e-02],
[-2.79757564e-05, 2.40338186e-05, 3.21772201e-02],
[-1.58452316e-02, -3.66386001e-02, 1.00000000e+00]]), array([[-4.57578416e-22, -3.20747801e-06, -1.13636364e-03],
[-3.20747801e-06, -2.51801851e-22, -1.44318674e-03],
[-1.13636364e-03, 4.26576739e-03, 1.00000000e+00]])]
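A cheap correctness probe for these matrices is the epipolar constraint: for a true correspondence, x2ᵀ F x1 should be near zero. A minimal sketch, assuming the (0, 1) match between images 0 and 1 that the matching step below reports:
# features are stored (row, col); homogeneous image points are (col, row, 1)
x1 = np.array([feature_data[0, 0, 1], feature_data[0, 0, 0], 1.0])  # feature 0 in image 0
x2 = np.array([feature_data[1, 1, 1], feature_data[1, 1, 0], 1.0])  # its match in image 1
line = F[0] @ x1  # epipolar line of x1 in image 1
dist = abs(x2 @ line) / np.hypot(line[0], line[1])  # point-to-line distance in pixels
print(f"Epipolar distance: {dist:.2f} px")  # should be on the order of a pixel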
Projecting epipolar lines¶
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k'] # Matplotlib color codes
for i in range(num_images-1):
img1 = cv2.cvtColor(cv2.imread(undistorted_images[i]), cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(cv2.imread(undistorted_images[i+1]), cv2.COLOR_BGR2RGB)
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
# Features in first image
axs[0].imshow(img1)
k = 0
for p1 in feature_data[i]:
if p1[0] == -1:
continue
        axs[0].plot(p1[1], p1[0], colors[k] + 'o', markersize=3)  # x = width/col, y = height/row
k += 1
axs[1].imshow(img2)
# Epipolar lines from features in second image using respective colours
k = 0
for p1 in feature_data[i]:
if p1[0] == -1:
continue
        x1h = np.array([p1[1], p1[0], 1])  # homogeneous point (u = col, v = row, 1)
        l2 = F[i] @ x1h  # epipolar line in the second image
        l2 = [l2[0]/l2[2], l2[1]/l2[2], 1.0]
        h, w = img2.shape[:2]
        # any two distinct points on the line (here at x = 0 and x = h) suffice for axline
        ya = -l2[2] / l2[1]
        yb = -(l2[2] + l2[0] * h) / l2[1]
        axs[1].axline((h, yb), (0, ya), color=colors[k])
k += 1
# Features in second image
for p1 in feature_data[i+1]:
if p1[0] == -1:
continue
axs[1].plot(p1[1], p1[0], 'ro', markersize=3)
plt.show()
Finding correspondences between consecutive images¶
num_features = len(feature_data[0])
def match_feature_(feature, features_to_match, F, distance_threshold=2):
"""
Matches feature to that of another view.
"""
if feature[0] == -1:
return -1
x1 = np.array([feature[1], feature[0], 1])
epipolar_line = F @ x1
epipolar_line = [epipolar_line[0]/epipolar_line[2], epipolar_line[1]/epipolar_line[2], 1.0]
best_match = -1
min_distance = float('inf')
for l in range(len(features_to_match)):
if features_to_match[l, 0] != -1:
x2 = np.array([features_to_match[l, 1], features_to_match[l, 0], 1])
distance = np.abs(epipolar_line[0] * x2[0] + epipolar_line[1] * x2[1] + epipolar_line[2]) / np.sqrt(epipolar_line[0]**2 + epipolar_line[1]**2)
if distance < min_distance:
min_distance = distance
best_match = l
if min_distance < distance_threshold:
return best_match
return -1
matches = []
for i in range(num_images-1):
matches_i = []
for j in range(num_features):
match = match_feature_(feature_data[i, j], feature_data[i+1], F[i])
if match != -1:
matches_i.append((j, match))
matches.append(matches_i)
print(f"{i}->{i+1}", matches_i)
0->1 [(0, 1), (1, 4), (2, 3), (3, 0), (4, 2), (5, 6), (6, 5)]
1->2 [(0, 6), (1, 3), (4, 1), (5, 0), (6, 2)]
2->3 [(0, 3), (2, 6), (3, 1), (6, 4)]
3->4 [(1, 0), (3, 5), (4, 4), (5, 1), (6, 3)]
4->5 [(0, 1), (3, 0), (4, 4), (5, 6)]
5->6 [(0, 4), (1, 5), (4, 0), (6, 2)]
6->7 [(4, 4), (6, 6)]
7->8 [(4, 0)]
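Nearest-to-epipolar-line matching is not guaranteed to be one-to-one, so a quick injectivity check is worthwhile (a sketch over the matches just computed):
# verify that no target feature was claimed by two different source features
for i, matches_i in enumerate(matches):
    targets = [m2 for _, m2 in matches_i]
    assert len(targets) == len(set(targets)), f"duplicate match in pair {i}->{i+1}"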
for i in range(num_images-1):
img1 = cv2.cvtColor(cv2.imread(undistorted_images[i]), cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(cv2.imread(undistorted_images[i+1]), cv2.COLOR_BGR2RGB)
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].imshow(img1)
for j, (match1, match2) in enumerate(matches[i]):
u1, v1 = feature_data[i, match1]
axs[0].plot(v1, u1, colors[j] + 'o', markersize=3) # Note: v1 is x, u1 is y for plotting
axs[1].imshow(img2)
for j, (match1, match2) in enumerate(matches[i]):
u2, v2 = feature_data[i+1, match2]
axs[1].plot(v2, u2, colors[j] + 'o', markersize=3) # Note: v2 is x, u2 is y
axs[0].set_title(f"Image {i}")
axs[1].set_title(f"Image {i+1}")
plt.tight_layout()
plt.show()
Visualising all correspondences¶
def get_transformation_matrices(view_idx):
"""
Calculates transformation matrices from the given view to all views
"""
pose = np.eye(4)
transformations = [pose]
for i in range(num_images-1 - view_idx):
pose = camera_movement[i+view_idx] @ pose
transformations.append(pose)
pose = np.eye(4)
for i in range(view_idx):
pose = np.linalg.inv(camera_movement[view_idx-1-i]) @ pose
transformations.insert(0, pose)
return transformations
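A round-trip sanity check for these compositions (a sketch; views 0 and 4 are arbitrary choices): mapping from view 0 to view 4 and back should give the identity.
T_from_0 = get_transformation_matrices(0)  # view 0 to every view
T_from_4 = get_transformation_matrices(4)  # view 4 to every view
assert np.allclose(T_from_4[0] @ T_from_0[4], np.eye(4), atol=1e-10)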
def show_features_from_view(idx):
transformations = get_transformation_matrices(idx)
for i in range(num_images):
if i == idx:
continue
img1 = cv2.cvtColor(cv2.imread(undistorted_images[idx]), cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(cv2.imread(undistorted_images[i]), cv2.COLOR_BGR2RGB)
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].imshow(img1)
k = 0
for p1 in feature_data[idx]:
if p1[0] == -1:
continue
            axs[0].plot(p1[1], p1[0], colors[k] + 'o', markersize=3)  # x = width/col, y = height/row
k += 1
R = transformations[i][:3, :3]
t = transformations[i][:3, 3]
E = get_essential_matrix(R, t)
F = get_fundamental_matrix(E, K)
axs[1].imshow(img2)
k = 0
bad_pair = False
for p1 in feature_data[idx]:
if p1[0] == -1:
continue
            x1h = np.array([p1[1], p1[0], 1])
            l2 = F @ x1h
            l2 = [l2[0]/l2[2], l2[1]/l2[2], 1.0]
            h, w = img2.shape[:2]
            ya = -l2[2] / l2[1]
            yb = -(l2[2] + l2[0] * h) / l2[1]
            axs[1].axline((h, yb), (0, ya), color=colors[k])
            k += 1
            # a nearly-degenerate epipolar line (huge intercepts) marks an unusable view pair
            if abs(ya) > (h + w) * 2 or abs(yb) > (h + w) * 2:
                bad_pair = True
                break
        if bad_pair:
print(f"! ! ! BAD VIEWS between images {idx} and {i}")
continue
for p1 in feature_data[i]:
if p1[0] == -1:
continue
axs[1].plot(p1[1], p1[0], 'ro', markersize=3)
axs[0].set_title(f"Image {idx}")
axs[1].set_title(f"Image {i}")
plt.show()
for i in range(num_images):
show_features_from_view(i)
! ! ! BAD VIEWS between images 0 and 3
! ! ! BAD VIEWS between images 0 and 6
! ! ! BAD VIEWS between images 1 and 4
! ! ! BAD VIEWS between images 1 and 7
! ! ! BAD VIEWS between images 2 and 5
! ! ! BAD VIEWS between images 2 and 8
! ! ! BAD VIEWS between images 3 and 0
! ! ! BAD VIEWS between images 3 and 6
! ! ! BAD VIEWS between images 4 and 1
! ! ! BAD VIEWS between images 4 and 7
! ! ! BAD VIEWS between images 5 and 2
! ! ! BAD VIEWS between images 5 and 8
! ! ! BAD VIEWS between images 6 and 0
! ! ! BAD VIEWS between images 6 and 3
! ! ! BAD VIEWS between images 7 and 1
! ! ! BAD VIEWS between images 7 and 4
! ! ! BAD VIEWS between images 8 and 2
! ! ! BAD VIEWS between images 8 and 5
Building correspondence matrix¶
feature_correspondence_matrix = []
features_matched = []
for i in range(num_images):
    feature = []
    matched_flags = []  # renamed from 'matches' to avoid shadowing the match list above
    for j in range(num_features):
        feature.append([-1, -1])
        matched_flags.append(False)
    feature_correspondence_matrix.append(feature)
    features_matched.append(matched_flags)
for i in range(num_features):
    feature_correspondence_matrix[0][i] = feature_data[0][i]
    features_matched[0][i] = True
# image dimensions from the calibration (all images share them)
h, w = camera_calibration['image_height'], camera_calibration['image_width']
ambig_param = 1.1
def match_feature(feature, features_to_match, F, distance_threshold=5):
"""
Matches feature to that of another view, or returns (1, [-1, -1]) if feature was not found, (0, [-1, -1]) and (0, [0, 0]) in case of bad view or ambiguity respectively.
"""
if feature[0] == -1:
return (1, [-1, -1])
    x1h = np.array([feature[1], feature[0], 1])
    epipolar_line = F @ x1h
    epipolar_line = [epipolar_line[0]/epipolar_line[2], epipolar_line[1]/epipolar_line[2], 1.0]
    # line heights at x = 0 and x = h; huge intercepts signal an unusable view pair
    ya = -epipolar_line[2] / epipolar_line[1]
    yb = -(epipolar_line[2] + epipolar_line[0] * h) / epipolar_line[1]
    if abs(ya) > (h + w) * 2 or abs(yb) > (h + w) * 2:
        return (0, [-1, -1])
best_match = -1
min_distance = float('inf')
ambiguity = False
for l in range(len(features_to_match)):
if features_to_match[l, 0] != -1:
x2 = np.array([features_to_match[l, 1], features_to_match[l, 0], 1])
distance = np.abs(epipolar_line[0] * x2[0] + epipolar_line[1] * x2[1] + epipolar_line[2]) / np.sqrt(epipolar_line[0]**2 + epipolar_line[1]**2)
if abs(distance-min_distance) < ambig_param:
ambiguity = True
if distance < min_distance:
if abs(distance-min_distance) >= ambig_param:
ambiguity = False
min_distance = distance
best_match = l
if ambiguity:
return (0, [0, 0])
if min_distance < distance_threshold:
return (1, features_to_match[best_match])
return (1, [-1, -1])
def find_missing_correspondences(view_idx):
transformations = get_transformation_matrices(view_idx)
for i in range(num_images):
if i == view_idx:
continue
        # skip views whose features are already all matched
        if all(features_matched[i]):
            continue
R = transformations[i][:3, :3]
t = transformations[i][:3, 3]
E = get_essential_matrix(R, t)
F = get_fundamental_matrix(E, K)
correspondences = []
matched = []
bad_view = False
for j in range(num_features):
feature = match_feature(feature_correspondence_matrix[view_idx][j], feature_data[i], F)
if feature[0] == 0 and feature[1][0] == -1:
bad_view = True
break
elif feature[0] == 0:
correspondences.append([-1, -1])
matched.append(False)
else:
correspondences.append(feature[1])
matched.append(True)
if bad_view:
continue
for j in range(num_features):
            if not features_matched[i][j]:
feature_correspondence_matrix[i][j] = correspondences[j]
features_matched[i][j] = matched[j]
for i in range(num_images):
    find_missing_correspondences(i)
    # stop once every feature in every view has been matched
    if all(all(row) for row in features_matched):
        break
for i in range(len(feature_correspondence_matrix)):
print(feature_correspondence_matrix[i])
[array([163, 616], dtype=int32), array([431, 593], dtype=int32), array([380, 672], dtype=int32), array([164, 660], dtype=int32), array([378, 462], dtype=int32), array([280, 422], dtype=int32), array([274, 650], dtype=int32)]
[array([183, 694], dtype=int32), array([495, 446], dtype=int32), array([495, 651], dtype=int32), array([192, 722], dtype=int32), array([398, 444], dtype=int32), array([282, 376], dtype=int32), array([346, 704], dtype=int32)]
[array([157, 501], dtype=int32), array([384, 621], dtype=int32), [-1, -1], array([157, 540], dtype=int32), [-1, -1], array([291, 480], dtype=int32), array([252, 536], dtype=int32)]
[array([199, 614], dtype=int32), [-1, -1], array([409, 690], dtype=int32), array([195, 658], dtype=int32), [-1, -1], array([285, 422], dtype=int32), array([307, 657], dtype=int32)]
[array([168, 693], dtype=int32), array([473, 446], dtype=int32), array([474, 681], dtype=int32), array([157, 722], dtype=int32), [-1, -1], array([285, 374], dtype=int32), array([322, 722], dtype=int32)]
[array([207, 500], dtype=int32), [-1, -1], [-1, -1], array([207, 539], dtype=int32), [-1, -1], array([285, 481], dtype=int32), array([303, 539], dtype=int32)]
[array([177, 610], dtype=int32), array([448, 574], dtype=int32), array([386, 649], dtype=int32), array([187, 650], dtype=int32), array([383, 460], dtype=int32), array([313, 424], dtype=int32), array([284, 636], dtype=int32)]
[[-1, -1], [-1, -1], [-1, -1], [-1, -1], array([505, 444], dtype=int32), array([444, 367], dtype=int32), [-1, -1]]
[[-1, -1], array([473, 631], dtype=int32), array([350, 602], dtype=int32), [-1, -1], [-1, -1], array([472, 485], dtype=int32), array([259, 554], dtype=int32)]
Finally, see the result¶
for i in range(num_images):
img = cv2.imread(undistorted_images[i])
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
k = 0
for feature in feature_correspondence_matrix[i]:
if feature[0] != -1:
plt.plot(feature[1], feature[0], colors[k] + 'o', markersize=5)
k += 1
plt.title(f"Image {i}")
plt.tight_layout()
plt.show()
Triangulation¶
def get_projection_matrix(K, view_index):
    # transformation from the view-0 (world) frame into the given view
    T = get_transformation_matrices(0)[view_index]
    P = K @ T[:3]
    return P
# Calculate all 3D points with multiview triangulation
def multiview_triangulation(features_matrix, projection_matrices):
triangulated_points = []
for feature_index, feature in enumerate(features_matrix):
A = []
        for view_index, point in enumerate(feature):
            if np.array(point).sum() == -2:  # [-1, -1] marks a feature invisible in this view
                continue
            P = projection_matrices[view_index]
            # points are stored (h, w), i.e. (y, x)
            y, x = point
            A.append(x * P[2, :] - P[0, :])  # row from the x-coordinate
            A.append(y * P[2, :] - P[1, :])  # row from the y-coordinate
A = np.array(A) # Convert to NumPy array
# Solve A * X = 0 using SVD
_, _, Vt = np.linalg.svd(A)
X_homogeneous = Vt[-1] # Last row of Vt corresponds to smallest singular value
# Convert to Euclidean coordinates
X_euclidean = X_homogeneous[:-1] / X_homogeneous[-1]
triangulated_points.append(X_euclidean)
return triangulated_points
# 2D points from the correspondence matrix, transposed from (9, 7, 2) to (7, 9, 2) so features index first
points_2d_all_features = np.array(feature_correspondence_matrix)
points_2d_all_features = np.transpose(points_2d_all_features, axes=(1, 0, 2))
# projection matrices for all views (taken from num_images, nothing hardcoded)
projection_matrices = [get_projection_matrix(K, x) for x in range(num_images)]
# Perform multiview triangulation
points_3d = multiview_triangulation(points_2d_all_features, projection_matrices)
# Print results
for i, point in enumerate(points_3d):
print(f"Feature {i}: {point}")
Feature 0: [ 1.35201537 -0.86230485  4.81112929]
Feature 1: [0.90839178 0.92869221 3.73654327]
Feature 2: [1.63211365 0.744573   4.43802616]
Feature 3: [ 1.60153403 -0.8015058   4.58633108]
Feature 4: [0.1593758  0.72950708 4.45851857]
Feature 5: [-0.11231227  0.03941065  3.93977091]
Feature 6: [ 1.59977506e+00 -4.41454716e-03  4.80063840e+00]
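To validate the DLT solver independently of the measured features, one can project a made-up 3D point into every view and triangulate it back (a sketch; X_true is hypothetical):
X_true = np.array([0.5, -0.3, 4.0, 1.0])  # hypothetical homogeneous 3D point
synthetic_2d = []
for P in projection_matrices:
    p = P @ X_true
    synthetic_2d.append([p[1] / p[2], p[0] / p[2]])  # stored (row, col) like the given features
recovered = multiview_triangulation([synthetic_2d], projection_matrices)[0]
print("Recovered:", recovered)  # should be close to X_true[:3]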
# check reprojection from 3D back to 2D (printed as row, column)
for i in range(num_features):
    X = np.hstack([points_3d[i], [1]])
    p = projection_matrices[0] @ X
    u, v, _ = p / p[-1]  # u = column, v = row
    print(v, u)
162.21231786277394 616.8408014973934
431.40464923232537 592.9857748052221
380.57602276517207 671.4240079881464
165.02622389449078 659.7448171223541
377.96433206211844 462.4946720013839
281.29492490859474 422.06075682657115
274.4213245758087 649.7045239324775
# Sparse reconstruction
def sparse_reconstruction(points_3d):
    point_cloud = o3d.geometry.PointCloud()
    point_cloud.points = o3d.utility.Vector3dVector(points_3d)
    # note: with only 7 points, nb_neighbors=20 exceeds the cloud size, so the
    # statistical outlier filter has little meaningful effect here
    point_cloud, ind = point_cloud.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    return point_cloud
sparse_cloud = sparse_reconstruction(points_3d)
o3d.visualization.draw_geometries([sparse_cloud])
def validate_triangulated_points(points_3d, points_2d_all_features, projection_matrices):
"""
Validate triangulated 3D points by computing reprojection errors.
Args:
points_3d (list of numpy.ndarray): List of triangulated 3D points (N, 3).
points_2d_all_features (numpy.ndarray): Correspondence matrix (features in 2D across views).
projection_matrices (list of numpy.ndarray): List of 3x4 projection matrices.
Returns:
list: Reprojection errors for each feature.
"""
reprojection_errors = []
for feature_index, point_3d in enumerate(points_3d):
total_error = 0
        for view_index, point_2d in enumerate(points_2d_all_features[feature_index]):
            if np.array(point_2d).sum() == -2:  # Skip invisible points
                continue
            point_3d_homogeneous = np.append(point_3d, 1)  # Convert to homogeneous
            reprojected_2d = projection_matrices[view_index] @ point_3d_homogeneous
            reprojected_2d /= reprojected_2d[2]  # Normalize by depth
            # reprojected_2d[:2] is (u, v) = (col, row); the stored points are (row, col),
            # so compare against the reversed point to get the true pixel error
            error = np.linalg.norm(reprojected_2d[:2] - np.array([point_2d[1], point_2d[0]]))
            total_error += error
reprojection_errors.append(total_error)
return reprojection_errors
validate_triangulated_points(points_3d, points_2d_all_features, projection_matrices)
[np.float64(4205.211350852528), np.float64(1074.922959641321), np.float64(2051.879155461192), np.float64(4570.947531424286), np.float64(378.5354189274818), np.float64(1482.0943986012326), np.float64(3748.8880211216688)]
import open3d as o3d
def generate_sparse_point_cloud(points_3d):
"""
Generate a sparse point cloud from triangulated 3D points.
Args:
points_3d (numpy.ndarray): Triangulated 3D points (N, 3).
Returns:
o3d.geometry.PointCloud: Sparse point cloud.
"""
point_cloud = o3d.geometry.PointCloud()
point_cloud.points = o3d.utility.Vector3dVector(points_3d)
return point_cloud
# Generate and visualize sparse point cloud
sparse_cloud = generate_sparse_point_cloud(points_3d)
o3d.visualization.draw_geometries([sparse_cloud])
One-view reconstruction¶
depth_maps = []
images = []
for i in range(len(rgb_paths)):
rgb_image = cv2.imread(undistorted_images[i])
depth_map = cv2.imread(depth_paths[i], cv2.IMREAD_GRAYSCALE)
depth_maps.append(depth_map)
images.append(rgb_image)
Relate depth values to Z-coordinate¶
def get_depth_for_features(feature_correspondence_matrix, depth_maps):
depth_values = []
for image_index, feature_points in enumerate(feature_correspondence_matrix):
image_depth_values = []
depth_map = depth_maps[image_index]
for point in feature_points:
if np.array_equal(point, [-1, -1]):
image_depth_values.append(None)
else:
                row, col = point  # features are stored (row, col)
                depth_value = depth_map[row, col]
image_depth_values.append(depth_value)
depth_values.append(image_depth_values)
return depth_values
depth_values = get_depth_for_features(feature_correspondence_matrix, depth_maps)
for image_index, image_depth_values in enumerate(depth_values):
print(f"Image {image_index}:")
for feature_index, depth in enumerate(image_depth_values):
print(f" Feature {feature_index}: Depth (0-255 scale) = {depth}")
Image 0:
 Feature 0: Depth (0-255 scale) = 204
 Feature 1: Depth (0-255 scale) = 160
 Feature 2: Depth (0-255 scale) = 190
 Feature 3: Depth (0-255 scale) = 193
 Feature 4: Depth (0-255 scale) = 190
 Feature 5: Depth (0-255 scale) = 167
 Feature 6: Depth (0-255 scale) = 204
Image 1:
 Feature 0: Depth (0-255 scale) = 163
 Feature 1: Depth (0-255 scale) = 150
 Feature 2: Depth (0-255 scale) = 150
 Feature 3: Depth (0-255 scale) = 147
 Feature 4: Depth (0-255 scale) = 200
 Feature 5: Depth (0-255 scale) = 186
 Feature 6: Depth (0-255 scale) = 160
Image 2:
 Feature 0: Depth (0-255 scale) = 237
 Feature 1: Depth (0-255 scale) = 196
 Feature 2: Depth (0-255 scale) = None
 Feature 3: Depth (0-255 scale) = 237
 Feature 4: Depth (0-255 scale) = None
 Feature 5: Depth (0-255 scale) = 167
 Feature 6: Depth (0-255 scale) = 247
Image 3:
 Feature 0: Depth (0-255 scale) = 200
 Feature 1: Depth (0-255 scale) = None
 Feature 2: Depth (0-255 scale) = 165
 Feature 3: Depth (0-255 scale) = 194
 Feature 4: Depth (0-255 scale) = None
 Feature 5: Depth (0-255 scale) = 155
 Feature 6: Depth (0-255 scale) = 190
Image 4:
 Feature 0: Depth (0-255 scale) = 161
 Feature 1: Depth (0-255 scale) = 130
 Feature 2: Depth (0-255 scale) = 128
 Feature 3: Depth (0-255 scale) = 151
 Feature 4: Depth (0-255 scale) = None
 Feature 5: Depth (0-255 scale) = 175
 Feature 6: Depth (0-255 scale) = 149
Image 5:
 Feature 0: Depth (0-255 scale) = 249
 Feature 1: Depth (0-255 scale) = None
 Feature 2: Depth (0-255 scale) = None
 Feature 3: Depth (0-255 scale) = 249
 Feature 4: Depth (0-255 scale) = None
 Feature 5: Depth (0-255 scale) = 166
 Feature 6: Depth (0-255 scale) = 249
Image 6:
 Feature 0: Depth (0-255 scale) = 208
 Feature 1: Depth (0-255 scale) = 179
 Feature 2: Depth (0-255 scale) = 206
 Feature 3: Depth (0-255 scale) = 205
 Feature 4: Depth (0-255 scale) = 207
 Feature 5: Depth (0-255 scale) = 180
 Feature 6: Depth (0-255 scale) = 214
Image 7:
 Feature 0: Depth (0-255 scale) = None
 Feature 1: Depth (0-255 scale) = None
 Feature 2: Depth (0-255 scale) = None
 Feature 3: Depth (0-255 scale) = None
 Feature 4: Depth (0-255 scale) = 234
 Feature 5: Depth (0-255 scale) = 198
 Feature 6: Depth (0-255 scale) = None
Image 8:
 Feature 0: Depth (0-255 scale) = None
 Feature 1: Depth (0-255 scale) = 206
 Feature 2: Depth (0-255 scale) = 242
 Feature 3: Depth (0-255 scale) = None
 Feature 4: Depth (0-255 scale) = None
 Feature 5: Depth (0-255 scale) = 162
 Feature 6: Depth (0-255 scale) = 231
def get_point_depth_in_view(point, view_idx):
depth_map = depth_maps[view_idx]
depth_value = depth_map[point[0], point[1]]
return depth_value
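Since each depth image stores the Z-depth divided by that image's maximum and scaled to 255, the grey value is proportional to Z, and one anchor point with known Z fixes the scale for a whole view. A minimal sketch of the relation that get_depths below relies on (the probe pixel is hypothetical):
# pixel = Z / Z_max * 255  =>  Z = pixel * (Z_anchor / pixel_anchor)
anchor_pixel = get_point_depth_in_view(feature_correspondence_matrix[0][0], 0)
scale = points_3d[0][2] / anchor_pixel  # scene units per grey level in view 0
probe = [300, 400]  # hypothetical (row, col) pixel, for illustration only
print(f"Estimated Z at {probe}: {get_point_depth_in_view(probe, 0) * scale:.3f}")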
Generate pixel coordinate datasets for point cloud¶
# sample every second pixel; take the dimensions from the calibration rather than hardcoding them
dataset_height = range(1, camera_calibration['image_height'], 2)
dataset_width = range(1, camera_calibration['image_width'], 2)
dataset_coordinates = []
for i in range(len(dataset_height)):
for j in range(len(dataset_width)):
dataset_coordinates.append([dataset_height[i], dataset_width[j]])
#print(dataset_coordinates)
print(len(dataset_coordinates))
121000
def get_depths(points):
    depths = []
    for i in range(len(points)):
        pixel_depth = get_point_depth_in_view(points[i], 0)
        if pixel_depth == 255 or pixel_depth == 0:
            depths.append(-1)  # 0 and 255 are treated as invalid/background
        else:
            # pixel = Z / Z_max * 255, so Z is proportional to the grey value;
            # the triangulated Z of anchor feature 0 fixes the scale for view 0
            depth = points_3d[0][2] * pixel_depth / get_point_depth_in_view(feature_correspondence_matrix[0][0], 0)
            depths.append(depth)
    return depths
dataset_z = get_depths(dataset_coordinates)
#print(dataset_z)
dataset_3d = []
for i in range(len(dataset_coordinates)):
    z = dataset_z[i]
    if z == -1:
        continue
    # back-project: coordinates and principal_point are both (row, col), so x uses
    # the column with cx = principal_point[1] and y the row with cy = principal_point[0]
    x = (dataset_coordinates[i][1] - principal_point[1]) / focal_length_px * z
    y = (dataset_coordinates[i][0] - principal_point[0]) / focal_length_px * z
    dataset_3d.append([x, y, z])
#print(dataset_3d[:5])
Extract colours for the selected points¶
def get_point_colour_in_view(point, view_idx):
    colour_value = images[view_idx][point[0]][point[1]]  # index the requested view, not always view 0
    return colour_value
dataset_colours = []
for i in range(len(dataset_coordinates)):
z = dataset_z[i]
if (z == -1):
continue
colour = get_point_colour_in_view(dataset_coordinates[i], 0)
dataset_colours.append(colour)
Export the point cloud via the utility function¶
def ply_creator(input_3d, rgb_data, filename):
assert (input_3d.ndim == 2), "Pass 3d points as NumPointsX3 array "
text1 = """ply\nformat ascii 1.0"""
text2 = "element vertex " + str(input_3d.shape[0])
text3 = """property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header"""
with open(filename, 'w') as ply_file:
ply_file.write(text1 + "\n")
ply_file.write(text2 + "\n")
ply_file.write(text3 + "\n")
        for i in range(input_3d.shape[0]):
            # OpenCV loads images as BGR, so the channels are reversed to RGB here
            ply_file.write(f"{input_3d[i, 0]} {input_3d[i, 1]} {input_3d[i, 2]} {rgb_data[i][2]} {rgb_data[i][1]} {rgb_data[i][0]}\n")
dataset_3d = np.array(dataset_3d)
dataset_colours = np.array(dataset_colours)
ply_creator(dataset_3d, dataset_colours, 'sparse_reconstruction.ply')
Dense Reconstruction¶
Load Depth Maps, Calibration Data and Camera poses¶
import cv2
import numpy as np
import os
def load_depth_maps(depth_folder, max_depth=10.0):
"""
Load and normalize depth maps from a folder.
Args:
depth_folder (str): Path to the folder containing depth images.
        max_depth (float): Assumed maximum depth used for scaling. Note: the maps
            were normalized per image by their own (unknown) maximum, so one global
            max_depth recovers depth only up to a per-image scale.
    Returns:
        list of numpy.ndarray: List of depth maps (approximate real-world scale).
"""
depth_maps = []
for filename in sorted(os.listdir(depth_folder)):
if filename.endswith(".png"):
# Load depth map as grayscale
depth_map = cv2.imread(os.path.join(depth_folder, filename), cv2.IMREAD_GRAYSCALE)
# Scale depth map to real-world depth
depth_map = (depth_map.astype(np.float32) / 255.0) * max_depth
depth_maps.append(depth_map)
return depth_maps
# Example usage
depth_folder = "data/depths"
depth_maps = load_depth_maps(depth_folder)
depth_maps
[array([[10., 10., 10., ..., 10., 10., 10.],
        ...,
        [10., 10., 10., ..., 10., 10., 10.]],
       shape=(551, 881), dtype=float32),
 ... (output truncated: nine depth maps, each shape (551, 881), dtype float32)]
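Note that max_depth=10.0 above is only an assumption, since each map was normalized by its own unknown maximum. A sketch of how the true maximum of one view could be estimated from a triangulated anchor feature:
# pixel / 255 * Z_max = Z  =>  Z_max = Z * 255 / pixel, using feature 0 in view 0 as anchor
row, col = feature_correspondence_matrix[0][0]
pixel = cv2.imread(depth_paths[0], cv2.IMREAD_GRAYSCALE)[row, col]
z_max_est = points_3d[0][2] * 255.0 / pixel
print(f"Estimated maximum depth of view 0: {z_max_est:.2f} (vs. the assumed 10.0)")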
# Extract calibration parameters
calibration_data = np.load("data/camera_calibration.npy", allow_pickle=True).item()
# Construct the intrinsic matrix
focal_length_px = calibration_data["focal_length_px"]
principal_point = calibration_data["principal_point"]
# principal_point is stored (row, col), so cx = principal_point[1], cy = principal_point[0]
intrinsics = np.array([
    [focal_length_px, 0, principal_point[1]],
    [0, focal_length_px, principal_point[0]],
    [0, 0, 1]
])
print("Constructed Intrinsic Matrix:\n", intrinsics)
Constructed Intrinsic Matrix:
 [[629.28571429   0.         440.        ]
 [  0.         629.28571429 275.        ]
 [  0.           0.           1.        ]]
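Since the Undistortion section already built K with the (row, col) principal-point convention, a one-line consistency check (sketch):
# the intrinsics rebuilt here should match the K constructed earlier
assert np.allclose(intrinsics, K)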
# Load relative camera transformations (view i -> view i+1) from file
camera_movement = np.load("data/camera_movement.npy")
print("Camera Movement Shape:", camera_movement.shape)  # (N-1, 4, 4) for N views
# Compose absolute camera-to-world poses for all N views: get_transformation_matrices(0)
# maps view-0 coordinates into view i, so the camera-to-world pose of view i is its
# inverse (view 0 serves as the world frame, matching the sparse reconstruction)
poses = [np.linalg.inv(T) for T in get_transformation_matrices(0)]
Camera Movement Shape: (8, 4, 4)
Integrate Depth Maps¶
def integrate_depth_maps(depth_maps, intrinsics, poses):
    """
    Integrate depth maps into a dense point cloud.
    Args:
        depth_maps (list of numpy.ndarray): Depth maps for each view.
        intrinsics (numpy.ndarray): Camera intrinsic matrix.
        poses (list of numpy.ndarray): Camera-to-world pose matrices, one per view.
    Returns:
        numpy.ndarray: Dense 3D points (N, 3).
    """
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    dense_points = []
    for depth_map, pose in zip(depth_maps, poses):
        h, w = depth_map.shape
        # back-project every valid pixel at once instead of looping in Python
        x_grid, y_grid = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth_map > 0  # ignore invalid depth values
        z = depth_map[valid]
        X_cam = np.stack([
            (x_grid[valid] - cx) * z / fx,
            (y_grid[valid] - cy) * z / fy,
            z,
            np.ones_like(z)
        ])
        X_world = (pose @ X_cam)[:3].T
        dense_points.append(X_world)
    return np.vstack(dense_points)
# Integrate depth maps into a dense cloud
dense_points = integrate_depth_maps(depth_maps, intrinsics, poses)
# Combine sparse and dense points
combined_points = np.vstack([np.asarray(sparse_cloud.points), dense_points])
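The combined cloud contains several million points, which makes the Delaunay step below expensive. A sketch of an optional voxel-grid downsample (using Open3D's voxel_down_sample; the voxel size 0.02 is an arbitrary choice for this scene's scale):
# optional: thin the combined cloud with a voxel grid before meshing
dense_cloud = o3d.geometry.PointCloud()
dense_cloud.points = o3d.utility.Vector3dVector(combined_points)
dense_cloud = dense_cloud.voxel_down_sample(voxel_size=0.02)  # hypothetical voxel size
downsampled = np.asarray(dense_cloud.points)
print("Points after voxel downsampling:", downsampled.shape[0])
# one could then call generate_mesh(downsampled) below instead of the full cloud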
from scipy.spatial import Delaunay
def generate_mesh(points_3d):
    """
    Generate a surface mesh from 3D points using Delaunay triangulation.
    Args:
        points_3d (numpy.ndarray): 3D points (N, 3).
    Returns:
        tuple: (vertices, faces) of the generated mesh.
    """
    # note: triangulating the XY projection assumes the cloud is roughly a height
    # field over that plane; for a full multi-view cloud this is a crude surface
    points_2d = points_3d[:, :2]
    tri = Delaunay(points_2d)
    vertices = points_3d
    faces = tri.simplices
    return vertices, faces
# Generate mesh
vertices, faces = generate_mesh(combined_points)
def save_mesh_to_ply(vertices, faces, filename="dense_reconstruction.ply"):
"""
Save a mesh to a PLY file.
Args:
vertices (numpy.ndarray): Mesh vertices (N, 3).
faces (numpy.ndarray): Mesh faces (M, 3).
filename (str): Output filename.
"""
with open(filename, "w") as f:
f.write("ply\n")
f.write("format ascii 1.0\n")
f.write(f"element vertex {len(vertices)}\n")
f.write("property float x\n")
f.write("property float y\n")
f.write("property float z\n")
f.write(f"element face {len(faces)}\n")
f.write("property list uchar int vertex_indices\n")
f.write("end_header\n")
for vertex in vertices:
f.write(f"{vertex[0]} {vertex[1]} {vertex[2]}\n")
for face in faces:
f.write(f"3 {face[0]} {face[1]} {face[2]}\n")
# Save the mesh
save_mesh_to_ply(vertices, faces, "dense_reconstruction.ply")
print("Dense reconstruction saved as 'dense_reconstruction.ply'.")
Dense reconstruction saved as 'dense_reconstruction.ply'.